In this second vignette, we will explore real-world spatial data and an application of spatial statistics from a social science perspective. We will build toward fitting the Gaussian models we’ve studied that account for spatial autocorrelation (SAR, CAR, Lag SAR). Along the way we will discuss modeling assumptions, contrast modeling decisions, and finally compare candidate models.
Our research question(s) will differ slightly based on your suggestions, but this vignette will touch on the geography of economic inequality and the legacy of discrimination, all within the Boston area. Specifically, we will model home values as a function of several socioeconomic characteristics at the neighborhood level. Note: We are purposefully not including physical characteristics of homes themselves - only neighborhood and individual characteristics will be used, aggregated at the neighborhood (Census Tract) level. The one exception we consider is the median year in which homes were built for each neighborhood, which we include as an illustrative example of handling missing data.
The file for this vignette combines data from two sources. The primary source is the US Census Bureau’s American Community Survey (ACS) 2021 estimates, which account for all but one of our covariates. These data are publicly available and can be accessed at data.census.gov. The other source is the University of Richmond’s Digital Scholarship Lab, which has compiled and georeferenced scans of historic redlining maps (that is, painstakingly converted images into shapefiles) from dozens of mid-20th-century American cities.
Redlining was a discriminatory practice in which the United States government produced maps of urban areas across the country for banks to use as lending guidelines when considering real estate investments. The practice began in 1935 and ended only in 1968 with the passage of the Fair Housing Act (Title VIII of the Civil Rights Act of 1968). It takes its name from the literal red lines drawn around neighborhoods deemed “most hazardous” or “undesirable”, which were disproportionately minority neighborhoods.
With this designation, prospective homeowners from redlined neighborhoods were routinely turned down for loans and were unable to build equity. Decades of discriminatory lending have entrenched a racial gap in homeownership and wealth between white and non-white Americans, on average. This gap can be readily observed in the publicly available Census data we will be working with today.
Importantly, redlining was not the beginning of housing discrimination in the United States - racially restrictive deeds were regularly employed without the direction of redlining maps, for instance - but because the maps are so plentiful and so well preserved, redlining is one of the most visible forms of that discrimination from a historical perspective. We will include a covariate in our spatial linear models that measures the proportion of each neighborhood, by area, that was given this “most hazardous” designation. When controlling for other important covariates (as we will), redlining may be statistically significant without being the driving predictor of home values. But the fact that a simple measure derived from a nearly 100-year-old map is related to so many salient socioeconomic characteristics of modern life speaks to the complicated, interconnected nature of urban geography.
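To make the redlining covariate concrete, one way to write it down is as a ratio of areas. The symbols $T_i$, $H$, and $r_i$ are our own notation, assumed here for illustration only; they do not come from either data source.

```latex
r_i \;=\; \frac{\operatorname{area}(T_i \cap H)}{\operatorname{area}(T_i)},
\qquad 0 \le r_i \le 1
```

Here $T_i$ is Census Tract $i$ and $H$ is the union of all areas marked “most hazardous” on the historic map. A tract lying entirely outside those areas has $r_i = 0$, and a tract lying entirely inside them has $r_i = 1$. Both areas should be computed in the same projected coordinate system so that the ratio is meaningful.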